Acquiring German Prepositional Subcategorization Frames from Corpora
Abstract
This paper presents a procedure to automatically learn German prepositional subcategorization frames from text corpora. It is based on shallow parsing techniques employed to identify high-accuracy cues for prepositional frames, the EM algorithm to solve the PP attachment problem implicit in the task, and a method to rank the evidence for subcategorization provided by the collected data.
Similar resources
The Automatic Acquisition Of Frequencies Of Verb Subcategorization Frames From Tagged Corpora
We describe a mechanism for automatically acquiring verb subcategorization frames and their frequencies in a large corpus. A tagged corpus is first partially parsed to identify noun phrases and then a linear grammar is used to estimate the appropriate subcategorization frame for each verb token in the corpus. In an experiment involving the identification of six fixed subcategorization frames, o...
A Subcategorization Acquisition System for French Verbs
This paper presents a system capable of automatically acquiring subcategorization frames (SCFs) for French verbs from the analysis of large corpora. We applied the system to a large newspaper corpus (consisting of 10 years of the French newspaper ’Le Monde’) and acquired subcategorization information for 3267 verbs. The system learned 286 SCF types for these verbs. From the analysis of 25 repre...
Automatic Acquisition of Adjectival Subcategorization from Corpora
This paper describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types which tests for the presence of grammatical relations (GRs) in the output of a robust statistical parser. It uses a powerful pattern-matching language to classify GRs into fr...
Using Loglinear Clustering for Subcategorization Identification
In this paper we will describe a process for mining syntactical verbal subcategorization, i.e. the information about the kind of phrases or clauses a verb goes with. We will use a large text corpus having almost 10,000,000 tagged words as our resource material. Loglinear modeling is used to analyze and automatically identify the subcategorization dependencies. An unsupervised clustering algorit...
Acquiring Syntactic Information for a Government Pattern Dictionary from Large Text Corpora
There are some research lines in automatic subcategorization frame acquisition and the importance of their work could not be doubted. However, almost all automatic work has been done in the constituent approach. Conversely, manual work is the traditional way for syntactic information acquisition in the dependency approach, which considers the correspondence between semantic valences and theirs ...
Publication date: 1997